Indexing Documents by Discourse and Semantic Contents from Automatic Annotations of Texts
Authors
Abstract
The main aim of the model proposed here is to automatically build a semantic metatext structure for texts, so that discourse and semantic information can be searched for and extracted from texts indexed in this way. The model is built from two engines. The first, EXCOM (Djioua et al., 2006), is an XML-based system for automatically annotating texts according to discourse and semantic categories. The second, MOCXE, uses the semantic annotations generated by EXCOM to create a semantic inverted index that can find documents relevant to queries associated with discourse and semantic categories such as definition, quotation, causality, and relations between concepts. We illustrate the approach with an example of a “connection” relation between concepts in French. The model is general enough to be adapted to other languages.

General presentation

Current web search engine systems that index texts generate representations as a set of simple and complex...
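To make the idea of a semantic inverted index concrete, here is a minimal sketch of what one might look like. It assumes that annotations arrive as (document, category, segment) triples. The `Annotation` and `SemanticInvertedIndex` names, the whitespace tokenizer, the AND-query semantics and the French example sentences are all hypothetical illustrations. They do not reproduce MOCXE's actual design or EXCOM's XML output format.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Punctuation stripped from token edges; apostrophes inside words (l'artère) are kept.
_EDGE_PUNCT = ".,;:!?\"'«»()"


@dataclass(frozen=True)
class Annotation:
    """One annotated segment (hypothetical stand-in for an EXCOM annotation)."""
    doc_id: str
    category: str  # discourse/semantic category, e.g. "definition", "connection"
    segment: str   # the annotated sentence


def normalize(token: str) -> str:
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(_EDGE_PUNCT)


class SemanticInvertedIndex:
    """Hypothetical inverted index whose keys pair a semantic category with a term."""

    def __init__(self) -> None:
        # category -> doc_id -> annotated segments of that category
        self._by_category = defaultdict(lambda: defaultdict(set))
        # (category, term) -> doc_id -> annotated segments containing the term
        self._postings = defaultdict(lambda: defaultdict(set))

    def add(self, ann: Annotation) -> None:
        self._by_category[ann.category][ann.doc_id].add(ann.segment)
        for raw in ann.segment.split():
            term = normalize(raw)
            if term:
                self._postings[(ann.category, term)][ann.doc_id].add(ann.segment)

    def search(self, category: str, *terms: str) -> dict[str, set[str]]:
        """Return doc_id -> segments annotated with `category` that contain
        every query term; with no terms, return all segments of that category."""
        if not terms:
            return {d: set(s) for d, s in self._by_category.get(category, {}).items()}
        result: dict[str, set[str]] | None = None
        for term in terms:
            postings = self._postings.get((category, normalize(term)), {})
            if result is None:
                result = {d: set(s) for d, s in postings.items()}
            else:
                # Keep only segments that also contain this term (AND semantics).
                result = {d: result[d] & postings[d] for d in result if d in postings}
            # Drop documents left with no matching segment.
            result = {d: s for d, s in result.items() if s}
        return result


if __name__ == "__main__":
    index = SemanticInvertedIndex()
    for ann in [
        Annotation("doc1", "connection", "Le cœur est relié aux poumons par l'artère pulmonaire."),
        Annotation("doc1", "definition", "Le cœur est un muscle creux."),
        Annotation("doc2", "connection", "Paris est reliée à Lyon par le TGV."),
    ]:
        index.add(ann)
    print(index.search("connection", "cœur"))  # only doc1's "connection" sentence
    print(sorted(index.search("connection")))  # every document with a connection annotation
```

The postings are keyed by (category, term) rather than by term alone. In this sketch, that is what lets a query for "connection" relations involving *cœur* skip the definition sentence that also contains the word.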
Similar resources
Automatic Semantic Subject Indexing of Web Documents in Highly Inflected Languages
Structured semantic metadata about unstructured web documents can be created using automatic subject indexing methods, avoiding laborious manual indexing. A successful automatic subject indexing tool for the web should work with texts in multiple languages and be independent of the domain of discourse of the documents and of controlled vocabularies. However, analyzing text written in a highly infle...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates automatic keyword extraction from the tables of contents of Persian science e-books using LDA topic modeling, evaluating the similarity of the extracted keywords to a gold standard and users' views of the model's keywords. Methodology: This is a mixed text-mining study in which LDA topic modeling is used to extract keywords from the table of contents of sci...
A Semi-Automatic Approach of old Arabic Documents Indexing
Indexing is a widely used technique in retrieval systems. Its goal is to extract and represent the meaning of a document so that users can find it. Two types of indexing can be distinguished: manual and automatic. Automatic indexing requires character and word recognition engines, which work only on the texts of contemporary documents. In this paper, we p...
Semantic Indexing and Typed Hyperlinking
In this paper, we describe linguistically sophisticated tools for the automatic annotation and navigation of on-line documents. Creation of these tools relies on research into finite-state technologies for the design and development of lexically intensive semantic indexing, shallow semantic understanding, and content abstraction techniques for texts. These tools utilize robust language processin...